grokspawn commented Oct 29, 2025

What this PR does / why we need it:

OLM v1 (operator-controller) is the next-generation operator lifecycle
management system for Kubernetes. This plugin:

  • Provides a smooth experience for managing ClusterExtensions, with
    sensible defaults
  • Handles the complexity of RBAC requirements transparently
  • Auto-detects and enables webhook support when needed
  • Simplifies troubleshooting with comprehensive status reporting
  • Makes OLM v1 accessible through natural language commands

Technical Highlights

Automatic Webhook Support

When installing an operator that requires admission webhooks (for example,
CloudNativePG or cert-manager), the plugin:

  1. Detects the "unsupported bundle: webhookDefinitions are not supported" error
  2. Checks whether cert-manager is installed
  3. Automatically enables the WebhookProviderCertManager feature gate
  4. Adds cert-manager.io RBAC permissions
  5. Triggers reconciliation to complete installation
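The detection and patch steps above might look roughly like the following Python sketch. It is not the plugin's implementation: it assumes operator-controller reads a --feature-gates flag from its container args, that the relevant container is at index 0, and that the deployment already has an args list. plan_webhook_fix and feature_gate_patch are hypothetical helpers.

```python
import json

# Error text quoted in the steps above; matching on it is how this sketch
# recognizes that webhook support is missing.
WEBHOOK_ERROR = "unsupported bundle: webhookDefinitions are not supported"
# Assumption: operator-controller accepts feature gates via a --feature-gates arg.
FEATURE_GATE_ARG = "--feature-gates=WebhookProviderCertManager=true"


def needs_webhook_support(condition_message: str) -> bool:
    """Return True if a ClusterExtension condition message reports the webhook error."""
    return WEBHOOK_ERROR in condition_message


def feature_gate_patch(container_index: int = 0) -> str:
    """Build a JSON Patch (RFC 6902) that appends the feature-gate arg to one container.

    The trailing "-" path segment appends to the args array; the "add" op fails
    if the container has no args array yet. This sketch also does not check
    whether a --feature-gates arg is already present.
    """
    ops = [{
        "op": "add",
        "path": f"/spec/template/spec/containers/{container_index}/args/-",
        "value": FEATURE_GATE_ARG,
    }]
    return json.dumps(ops)


def plan_webhook_fix(condition_message: str, cert_manager_installed: bool) -> list[str]:
    """Return the ordered actions from the workflow above, or why it must stop."""
    if not needs_webhook_support(condition_message):
        return []
    if not cert_manager_installed:
        return ["install cert-manager first"]
    return [
        "patch operator-controller deployment: " + feature_gate_patch(),
        "add cert-manager.io rules to the installer ClusterRole",
        "wait for the ClusterExtension to reconcile",
    ]
```

The resulting patch could then be applied with kubectl patch deployment <deployment> -n <namespace> --type=json -p '<patch>'; the deployment name and namespace depend on how OLM v1 was installed on the cluster.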

RBAC Preflight Integration

When the PreflightPermissions feature gate is enabled, the plugin:

  1. Parses "pre-authorization failed" messages
  2. Extracts missing permissions (Namespace, APIGroups, Resources, Verbs)
  3. Updates the installer ClusterRole with the missing rules
  4. Waits for automatic retry and installation completion
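A minimal Python sketch of the parsing step, assuming the preflight message lists one missing permission per line in a form like Namespace:"" APIGroups:[] Resources:[services] Verbs:[list,watch], with an optional ResourceNames:[...] field. That layout is an assumption about operator-controller's output, not a documented contract, and parse_missing_rules is a hypothetical helper.

```python
import re

# Assumed shape of one missing-permission line in a "pre-authorization failed" message.
LINE_RE = re.compile(
    r'Namespace:"(?P<ns>[^"]*)"\s+APIGroups:\[(?P<groups>[^\]]*)\]\s+'
    r'Resources:\[(?P<resources>[^\]]*)\]\s+'
    r'(?:ResourceNames:\[(?P<names>[^\]]*)\]\s+)?'
    r'Verbs:\[(?P<verbs>[^\]]*)\]'
)


def split_list(raw: str) -> list[str]:
    """Split a bracketed list body on commas or whitespace, dropping empties."""
    return [item for item in re.split(r"[,\s]+", raw.strip()) if item]


def parse_missing_rules(message: str) -> list[dict]:
    """Extract missing permissions as dicts shaped like RBAC PolicyRules plus a namespace."""
    if "pre-authorization failed" not in message:
        return []
    rules = []
    for m in LINE_RE.finditer(message):
        rule = {
            "namespace": m.group("ns"),
            # Assumption: empty brackets denote the core API group "".
            "apiGroups": split_list(m.group("groups")) or [""],
            "resources": split_list(m.group("resources")),
            "verbs": split_list(m.group("verbs")),
        }
        if m.group("names"):
            rule["resourceNames"] = split_list(m.group("names"))
        rules.append(rule)
    return rules
```

Because the plugin writes these into a ClusterRole, the namespace field would be dropped before the rules are applied; it is kept here only for reporting.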

Comprehensive Baseline RBAC

The /olmv1:install command creates ClusterRoles with permissions for:

  • Core resources (pods, services, configmaps, secrets, PVCs)
  • Apps resources (deployments, statefulsets, daemonsets)
  • RBAC management (with bind/escalate for operator-created roles)
  • CRDs and finalizers
  • OLM resources (ClusterExtension finalizers for blockOwnerDeletion)
  • Admission webhooks
  • cert-manager resources (when webhooks are used)
  • Coordination (leases for leader election)
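As an illustration, the baseline above could be expressed as ClusterRole rules like the following Python sketch. The API group names are the standard Kubernetes and OLM v1 groups, but the exact resource lists, the CRUD verb set, and the helper names baseline_rules and cluster_role are assumptions for illustration, not the plugin's actual manifest.

```python
# Assumed verb set for "manage" permissions; the real baseline may differ.
CRUD = ["get", "list", "watch", "create", "update", "patch", "delete"]


def rule(groups: list[str], resources: list[str], verbs: list[str]) -> dict:
    """Build one PolicyRule dict; copies the lists so rules never share them."""
    return {"apiGroups": list(groups), "resources": list(resources), "verbs": list(verbs)}


def baseline_rules(include_cert_manager: bool = False) -> list[dict]:
    rules = [
        rule([""], ["pods", "services", "configmaps", "secrets", "persistentvolumeclaims"], CRUD),
        rule(["apps"], ["deployments", "statefulsets", "daemonsets"], CRUD),
        # bind/escalate let the installer create roles granting permissions it does not hold.
        rule(["rbac.authorization.k8s.io"],
             ["roles", "rolebindings", "clusterroles", "clusterrolebindings"],
             CRUD + ["bind", "escalate"]),
        rule(["apiextensions.k8s.io"], ["customresourcedefinitions"], CRUD),
        # Needed so owner references with blockOwnerDeletion can point at the ClusterExtension.
        rule(["olm.operatorframework.io"], ["clusterextensions/finalizers"], ["update"]),
        rule(["admissionregistration.k8s.io"],
             ["validatingwebhookconfigurations", "mutatingwebhookconfigurations"], CRUD),
        rule(["coordination.k8s.io"], ["leases"], CRUD),
    ]
    if include_cert_manager:
        rules.append(rule(["cert-manager.io"], ["certificates", "issuers"], CRUD))
    return rules


def cluster_role(name: str, include_cert_manager: bool = False) -> dict:
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": baseline_rules(include_cert_manager),
    }
```

Serializing the result with json.dumps and piping it to kubectl apply -f - would create the role, since kubectl accepts JSON manifests.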

Special notes for your reviewer:

  • All commands follow the pattern established in the OLM v0 plugin (PR #70,
    "Add OLM Plugin for Day-2 Operator Management")
  • Webhook support auto-enablement uses kubectl patch
  • The plugin handles both cert-manager (Kubernetes) and OpenShift Service CA
    webhook providers
  • RBAC baseline includes lessons learned from real-world installations
    (postgres-operator, cloudnative-pg)
  • Feature gate detection helps users understand cluster capabilities

Checklist:

  • Subject and description added to both commit and PR
  • Comprehensive README with examples and key concepts
  • All commands include detailed step-by-step instructions
  • Error handling for webhook, RBAC, and compatibility issues
  • Example outputs for success and failure scenarios

Background:

OLM v1 Plugin for ClusterExtension Management

This PR introduces a new plugin for managing Kubernetes extensions using OLM
v1 (operator-controller), the next-generation operator lifecycle management
system that provides a simpler, more flexible approach to managing cluster
extensions.

What is OLM v1?

OLM v1 (operator-controller) is the successor to OLM v0, offering:

  • Simpler API surface: ClusterExtension and ClusterCatalog instead of
    Subscriptions, CSVs, InstallPlans
  • No cluster-admin privileges: Admins provide ServiceAccounts with explicit
    RBAC permissions
  • Better declarative management: Single resource to manage extension
    lifecycle
  • Broader extension support: Not just operators - Helm charts and other
    extension types
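To make the "single resource" point concrete, here is a hedged Python sketch that builds a ClusterExtension manifest. The field layout (spec.namespace, spec.serviceAccount.name, spec.source.sourceType and spec.source.catalog) reflects our understanding of the olm.operatorframework.io/v1 API; the helper name cluster_extension and the example values are hypothetical.

```python
import json
from typing import Optional


def cluster_extension(name: str, package: str, namespace: str, service_account: str,
                      version: Optional[str] = None,
                      channels: Optional[list[str]] = None) -> dict:
    """Build a ClusterExtension that installs `package` from any ClusterCatalog."""
    catalog = {"packageName": package}
    if version:
        # A pinned version or a range such as ">=1.14.0 <2.0.0".
        catalog["version"] = version
    if channels:
        catalog["channels"] = list(channels)
    return {
        "apiVersion": "olm.operatorframework.io/v1",
        "kind": "ClusterExtension",
        "metadata": {"name": name},
        "spec": {
            "namespace": namespace,
            # OLM v1 installs with this ServiceAccount's RBAC, not cluster-admin.
            "serviceAccount": {"name": service_account},
            "source": {"sourceType": "Catalog", "catalog": catalog},
        },
    }


manifest = cluster_extension("cert-manager", "cert-manager", "cert-manager",
                             "cert-manager-installer", version=">=1.14.0 <2.0.0")
print(json.dumps(manifest, indent=2))
```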

Commands

Core Operations:

  • /olmv1:search - Search and discover extensions across cluster
    catalogs with version/channel information
  • /olmv1:install - Install extensions with automatic RBAC
    setup, webhook support detection, and iterative permission fixes
  • /olmv1:list - View all installed ClusterExtensions with status and health
    indicators
  • /olmv1:status - Detailed health status, deployment info,
    CRDs, and webhook configuration
  • /olmv1:uninstall - Safely remove extensions with
    automatic cleanup of namespaces and RBAC resources

Update Management:

  • /olmv1:upgrade - Update extensions to specific versions,
    version ranges, or channels
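Because a ClusterExtension carries its version selection in spec.source.catalog, an upgrade can plausibly be expressed as a JSON merge patch on that field. This Python sketch assumes that approach; upgrade_patch is a hypothetical helper, not part of the plugin.

```python
import json
from typing import Optional


def upgrade_patch(version: Optional[str] = None,
                  channels: Optional[list[str]] = None) -> str:
    """Build a JSON merge patch that changes the version and/or channels to track."""
    if version is None and channels is None:
        raise ValueError("specify a version, a version range, or channels")
    catalog: dict = {}
    if version is not None:
        catalog["version"] = version
    if channels is not None:
        catalog["channels"] = list(channels)
    # Merge-patch semantics: fields not mentioned here are left unchanged.
    return json.dumps({"spec": {"source": {"catalog": catalog}}}, sort_keys=True)
```

For example, kubectl patch clusterextension cert-manager --type=merge -p "$(...)" with the output of upgrade_patch(version="1.15.0") would pin the extension to 1.15.0; under merge-patch rules a field is cleared by setting it to null.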

Advanced Features:

  • /olmv1:fix-rbac - Automatically analyze and fix RBAC
    permission issues
  • /olmv1:catalog-add - Add new ClusterCatalog sources
  • /olmv1:catalog-list - List available ClusterCatalogs and their health
    status

Key Features

Intelligent RBAC Management

  • Automatic ServiceAccount creation with baseline permissions
  • Iterative RBAC fixes: Detects missing permissions and automatically
    updates ClusterRole
  • PreflightPermissions support: Shows detailed permission errors before
    installation attempts
  • OLM resource permissions: Includes finalizers and ClusterExtension
    management rights
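The iterative fix can be pictured as merging newly reported rules into the existing ClusterRole until a round adds nothing. This Python sketch assumes rules are dicts with apiGroups, resources, verbs and optional resourceNames (plus an ignored namespace key); merge_rules is a hypothetical helper, and it does not treat a "*" verb as covering other verbs.

```python
def rule_key(rule: dict) -> tuple:
    """Identify a rule by what it targets, ignoring verbs and namespace."""
    return (
        tuple(sorted(rule.get("apiGroups", []))),
        tuple(sorted(rule.get("resources", []))),
        tuple(sorted(rule.get("resourceNames", []))),
    )


def merge_rules(existing: list[dict], missing: list[dict]) -> tuple[list[dict], int]:
    """Return (merged rules, number of additions); the input lists are not mutated."""
    merged = [dict(r, verbs=list(r["verbs"])) for r in existing]
    index = {rule_key(r): r for r in merged}
    changes = 0
    for rule in missing:
        target = index.get(rule_key(rule))
        if target is None:
            # New target: copy it over as a PolicyRule (ClusterRoles have no namespace).
            new = {k: list(v) if isinstance(v, list) else v
                   for k, v in rule.items() if k != "namespace"}
            merged.append(new)
            index[rule_key(new)] = new
            changes += 1
            continue
        for verb in rule["verbs"]:
            if verb not in target["verbs"]:
                target["verbs"].append(verb)
                changes += 1
    return merged, changes
```

A caller would repeat "apply, wait for the next preflight message, merge" and stop once merge_rules reports zero changes, which avoids looping forever on an error the ClusterRole cannot fix.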

Webhook Support with Auto-Configuration

  • Feature gate detection: Checks if WebhookProviderCertManager or
    WebhookProviderOpenshiftServiceCA is enabled
  • Automatic enablement: Detects cert-manager and can automatically enable
    webhook support
  • cert-manager RBAC: Automatically adds required permissions for Certificate
    resources
  • Clear error messages: Explains webhook requirements and provides fix
    commands
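Feature gate detection can be sketched as reading the operator-controller container args. This Python example assumes gates are passed in the --feature-gates=Name=bool,Name=bool form; the space-separated form ("--feature-gates" followed by a separate value) is not handled, and enabled_feature_gates and webhook_provider are hypothetical helpers.

```python
def enabled_feature_gates(args: list[str]) -> dict[str, bool]:
    """Collect feature gates from --feature-gates=... args; later entries win."""
    gates: dict[str, bool] = {}
    for arg in args:
        if not arg.startswith("--feature-gates="):
            continue
        for pair in arg.split("=", 1)[1].split(","):
            name, sep, value = pair.partition("=")
            if name and sep:
                gates[name.strip()] = value.strip().lower() == "true"
    return gates


def webhook_provider(args: list[str]) -> "str | None":
    """Report which webhook provider gate is enabled, if any."""
    gates = enabled_feature_gates(args)
    if gates.get("WebhookProviderCertManager"):
        return "cert-manager"
    if gates.get("WebhookProviderOpenshiftServiceCA"):
        return "openshift-service-ca"
    return None
```

The args list would come from the operator-controller Deployment's pod template, for example via kubectl get deployment ... -o json.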

Comprehensive Health Monitoring

  • Multi-layer status: ClusterExtension conditions, deployment health, pod
    status
  • CRD tracking: Lists all CRDs created by extensions
  • Webhook validation: Verifies ValidatingWebhookConfigurations and
    MutatingWebhookConfigurations
  • Event aggregation: Shows recent events from extension namespace
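Two of the status layers can be sketched in Python as follows. The sketch assumes ClusterExtension status reports Installed and Progressing conditions and that deployment health reduces to readyReplicas versus the desired replica count; summarize and deployment_ready are hypothetical helpers, and the real status command gathers more detail than this.

```python
def summarize(conditions: list[dict]) -> tuple[str, str]:
    """Reduce ClusterExtension conditions to a (state, message) pair."""
    by_type = {c.get("type"): c for c in conditions}
    installed = by_type.get("Installed", {})
    progressing = by_type.get("Progressing", {})
    if installed.get("status") == "True":
        return "installed", installed.get("message", "")
    if progressing.get("status") == "True":
        # Still reconciling; the message often carries the error being retried.
        return "progressing", progressing.get("message", "")
    return "failed", progressing.get("message") or installed.get("message", "")


def deployment_ready(status: dict, desired_replicas: int) -> bool:
    """A Deployment is treated as healthy when all desired replicas are ready."""
    return status.get("readyReplicas", 0) >= desired_replicas
```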

Safe Operations

  • Confirmation prompts: Shows what will be deleted before uninstalling
  • Namespace cleanup: Automatically removes namespaces and RBAC when safe
  • Dependency detection: Warns about CRDs and custom resources that will be
    removed
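One way to decide what "safe" cleanup means is sketched below in Python. It assumes a namespace is only deleted when no other ClusterExtension targets it, and that installer RBAC is removed after the ClusterExtension because operator-controller uses the installer ServiceAccount while uninstalling. plan_uninstall and its inputs are hypothetical, not the plugin's actual logic.

```python
def plan_uninstall(name: str, namespace: str,
                   all_extensions: list[tuple[str, str]],
                   owned_crds: list[str]) -> tuple[list[str], list[str]]:
    """Return (ordered actions, warnings) for removing one ClusterExtension.

    all_extensions holds (name, namespace) for every ClusterExtension on the cluster.
    """
    actions = [f"delete clusterextension {name}"]
    # RBAC goes after the ClusterExtension so the uninstall itself keeps its permissions.
    actions.append(f"delete installer RBAC (ServiceAccount, ClusterRole, ClusterRoleBinding) for {name}")
    warnings = [f"CRD {crd} and all of its custom resources will be removed" for crd in owned_crds]
    others = sorted(n for n, ns in all_extensions if ns == namespace and n != name)
    if others:
        warnings.append(f"keeping namespace {namespace}: still used by {', '.join(others)}")
    else:
        actions.append(f"delete namespace {namespace}")
    return actions, warnings
```

Showing both lists before acting matches the confirmation-prompt behavior described above.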

Example Workflow

Basic installation:

Search for an extension

/olmv1:search postgres

Install CloudNativePG (automatically handles webhooks and RBAC)

/olmv1:install cloudnative-pg

Check installation status

/olmv1:status cloudnative-pg

List all installed extensions

/olmv1:list

Version management:

Install specific version

/olmv1:install cert-manager --version ">=1.14.0 <2.0.0"

Upgrade to new version

/olmv1:upgrade cert-manager --version "1.15.0"
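The range ">=1.14.0 <2.0.0" means "at least 1.14.0 and below 2.0.0". OLM v1 evaluates such strings with a fuller semver constraint syntax (our understanding is that it follows the Masterminds/semver library, which also supports ||, ~, ^ and wildcards); the Python sketch below handles only space-separated comparisons on plain X.Y.Z versions, and satisfies and highest_match are hypothetical helpers.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq}
# Longer operators are listed first so ">=" is not read as ">".
TERM_RE = re.compile(r"^(>=|<=|>|<|=)?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> tuple[int, int, int]:
    major, minor, patch = (int(p) for p in text.split("."))
    return (major, minor, patch)


def satisfies(version: str, constraint: str) -> bool:
    """True if version meets every space-separated term (logical AND)."""
    v = parse_version(version)
    for term in constraint.split():
        m = TERM_RE.match(term)
        if m is None:
            raise ValueError(f"unsupported constraint term: {term!r}")
        op = OPS[m.group(1) or "="]  # a bare version means equality
        bound = tuple(int(x) for x in m.group(2, 3, 4))
        if not op(v, bound):
            return False
    return True


def highest_match(available: list[str], constraint: str) -> "str | None":
    matches = [v for v in available if satisfies(v, constraint)]
    return max(matches, key=parse_version, default=None)
```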

Troubleshooting:

Check status with feature gate information

/olmv1:status my-operator

Fix RBAC permissions automatically

/olmv1:fix-rbac my-operator

Uninstall and cleanup

/olmv1:uninstall my-operator

Catalog management:

List available catalogs

/olmv1:catalog-list

Add custom catalog

/olmv1:catalog-add my-catalog quay.io/my-org/my-catalog:latest
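A catalog-add command plausibly boils down to creating a ClusterCatalog like the one this Python sketch builds. The spec.source.type "Image", image.ref and image.pollIntervalMinutes fields reflect our understanding of the olm.operatorframework.io/v1 API; cluster_catalog is a hypothetical helper, and the digest check mirrors, as far as we know, an API validation rule that polling only applies to tag references.

```python
from typing import Optional


def cluster_catalog(name: str, image_ref: str,
                    poll_interval_minutes: Optional[int] = None) -> dict:
    """Build a ClusterCatalog that serves content from a catalog image."""
    image: dict = {"ref": image_ref}
    if poll_interval_minutes is not None:
        if "@sha256:" in image_ref:
            # A digest never changes, so polling it would be pointless.
            raise ValueError("pollIntervalMinutes requires a tag, not a digest")
        image["pollIntervalMinutes"] = poll_interval_minutes
    return {
        "apiVersion": "olm.operatorframework.io/v1",
        "kind": "ClusterCatalog",
        "metadata": {"name": name},
        "spec": {"source": {"type": "Image", "image": image}},
    }
```

With a mutable tag such as :latest, a poll interval lets the catalog pick up new content without being recreated.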

Critical Differences from OLM v0

| Aspect            | OLM v0                   | OLM v1                                        |
|-------------------|--------------------------|-----------------------------------------------|
| Main resource     | Subscription + CSV       | ClusterExtension                              |
| Catalog           | CatalogSource            | ClusterCatalog                                |
| RBAC              | Cluster-admin by default | Explicit ServiceAccount required              |
| Webhooks          | Always supported         | Requires feature gate                         |
| Permission errors | Generic failures         | Detailed preflight checks (with feature gate) |
| API complexity    | Nine resources           | Two declarative resources                     |

Assisted-by: claude

openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Oct 29, 2025

openshift-ci bot commented Oct 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: grokspawn
Once this PR has been reviewed and has the lgtm label, please assign dgoodwin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Oct 29, 2025

Hi @grokspawn. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci bot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Oct 29, 2025
ability to diagnose/repair common installation issues, and a
least-privilege mode of defining clusterextension service accounts.
Extends knowledge of current feature gates to facilitate additional
capabilities (e.g. webhooks) and provide greater insight into RBAC/SA
failures.

Signed-off-by: grokspawn <[email protected]>
Assisted-by: claude
grokspawn changed the title from "first claude-pass on an olmv1 claude plugin for managing clusterextensions" to "an olmv1 claude plugin for managing clusterextensions"

grokspawn (author) commented:

I structured the commands to be symmetric with #54, and the plugin currently works with both 4.18 GA and 4.21 Tech Preview content.
I'll hold off on further development here while we discuss how to converge the multiple efforts underway.
